Systems and methods for workload distribution across processing units

ABSTRACT

Workload distribution in a system including a non-volatile memory device is disclosed. A request is received including an address associated with a memory location of the non-volatile memory device. A hash value is calculated based on the address. A list of node values is searched, and one of the node values in the list is identified based on the hash value. A processor is identified based on the one of the node values, and the address is stored in association with the processor. The request is transmitted to the processor for accessing the memory location.

CROSS-REFERENCE TO RELATED APPLICATION(S)

The present application claims priority to and the benefit of U.S. Provisional Application No. 63/247,698, filed Sep. 23, 2021, entitled “METHOD FOR WORKLOAD BALANCING ACROSS FCORES USING CONSISTENT HASHING,” the entire content of which is incorporated herein by reference.

FIELD

One or more aspects of embodiments according to the present disclosure relate to computational systems, and more particularly to distributing workloads to processing units of the computational systems.

BACKGROUND

Flash memory may be used as a non-volatile memory system to store different types of data, including voice and image data. A memory card, such as a solid state drive (SSD), is an example of a storage device using a flash memory system.

A central processing unit (CPU) may be part of the flash memory system for managing input/output (I/O) operations of the flash memory. When there are multiple CPUs in the flash memory system, one of the CPUs is selected to handle the I/O request.

The above information disclosed in this Background section is only for enhancement of understanding of the background of the present disclosure, and therefore, it may contain information that does not form prior art.

SUMMARY

An embodiment of the present disclosure is directed to a method for workload distribution in a system having a non-volatile memory device. The method includes receiving a request including an address associated with a memory location of the non-volatile memory device; calculating a hash value based on the address; searching a list of node values; identifying one of the node values in the list based on the hash value; identifying a processor based on the one of the node values; storing the address in association with the processor; and transmitting the request to the processor for accessing the memory location.

According to one embodiment, the request includes a request to write data to the memory location.

According to one embodiment, the address includes a logical address.

According to one embodiment, the identifying of one of the node values is based on consistent hashing.

According to one embodiment, the list of node values represents a consistent hash ring.

According to one embodiment, the list of node values is a sorted list, wherein the identifying of the one of the node values includes identifying a first one of the node values in the list with a value equal to or greater than the hash value.

According to one embodiment, the identifying of the processor includes searching a table including a mapping of the node values to processor identifiers.

According to one embodiment, a processor identifier for the processor is stored in a bitmap in association with the address.

According to one embodiment, the processor includes an embedded processor, and the memory location identifies a location of a flash memory device.

According to one embodiment, the method further includes: identifying a trigger condition associated with the processor; identifying a node associated with the processor, wherein the node is identified via one of the node values; identifying an address range covered by the node; associating at least a portion of the address range to a second node; and mapping the second node to a second processor different from the processor.

An embodiment of the present disclosure is also directed to a non-volatile memory system, comprising: a non-volatile memory device; a first processor coupled to the non-volatile memory device; and a second processor coupled to the first processor. The second processor may be configured to: receive a request including an address associated with a memory location of the non-volatile memory device; calculate a hash value based on the address; map the hash value onto a list of node values; identify one of the node values in the list based on the hash value; identify the first processor based on the one of the node values; store the address in association with the first processor; and transmit the request to the first processor for accessing the memory location.

As a person of skill in the art should recognize, use of hashing (e.g., consistent hashing) to identify a node value of a virtual CPU that is further mapped to a physical CPU may allow balanced workload distribution to multiple physical CPUs in a non-volatile memory system. In addition, the ability to add (or delete) node values associated with the virtual CPUs helps the system minimize the negative effects of die failures or other drive condition changes.

These and other features, aspects, and advantages of the embodiments of the present disclosure will be more fully understood when considered with respect to the following detailed description, appended claims, and accompanying drawings. Of course, the actual scope of the invention is defined by the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting and non-exhaustive embodiments of the present disclosure are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified.

FIG. 1 is a block diagram of a system for distributing data requests according to one embodiment;

FIG. 2A is a conceptual block diagram of an example hash ring used with consistent hashing according to one embodiment;

FIG. 2B is a conceptual block diagram of the hash ring of FIG. 2A, that adds an additional node to the ring according to one embodiment;

FIG. 2C is a conceptual block diagram of the hash ring of FIG. 2A, that removes a node from the ring according to one embodiment;

FIG. 3A is an exemplary mapping table of virtual CPUs to physical CPUs according to one embodiment;

FIG. 3B is the mapping table of FIG. 3A, that adds an entry for a new virtual CPU according to one embodiment;

FIG. 4 is a conceptual block diagram of an example bitmap according to one embodiment;

FIG. 5 is a flow diagram of a process for processing a first access request according to one embodiment;

FIG. 6 is a flow diagram of a process for processing a second access request according to one embodiment; and

FIG. 7 is a flow diagram of a process for adding a virtual node to a hash ring according to one embodiment.

DETAILED DESCRIPTION

Hereinafter, example embodiments will be described in more detail with reference to the accompanying drawings, in which like reference numbers refer to like elements throughout. The present disclosure, however, may be embodied in various different forms, and should not be construed as being limited to only the illustrated embodiments herein. Rather, these embodiments are provided as examples so that this disclosure will be thorough and complete, and will fully convey the aspects and features of the present disclosure to those skilled in the art. Accordingly, processes, elements, and techniques that are not necessary to those having ordinary skill in the art for a complete understanding of the aspects and features of the present disclosure may not be described. Unless otherwise noted, like reference numerals denote like elements throughout the attached drawings and the written description, and thus, descriptions thereof may not be repeated. Further, in the drawings, the relative sizes of elements, layers, and regions may be exaggerated and/or simplified for clarity.

A central processing unit (CPU) may be part of the flash memory system for managing input/output (I/O) operations of the flash memory. When there are multiple CPUs in the flash memory system, it may be desirable to distribute the I/O requests to the multiple CPUs to achieve balanced workload distribution.

When a flash memory system includes various embedded central processing units (CPUs), one traditional mechanism for identifying one of the CPUs to process an I/O request may be, for example, a mathematical calculation. For example, a modulo operation may be performed based on the logical block address (LBA) of the I/O request and the total number of CPUs in the flash memory system (e.g., by taking the LBA modulo the total number of CPUs). Such a math operation may distribute the workloads among the embedded CPUs in a manner similar to round robin.
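
For illustration only, the following is a minimal sketch of such a modulo-based selection; the function name and the 64-bit LBA type are assumptions for the example and are not part of the disclosure.

#include <stdint.h>

/* Traditional selection: the handling CPU is simply the LBA modulo the
 * number of embedded CPUs, which spreads requests in a round-robin-like
 * manner but cannot adapt when one CPU degrades (e.g., after a die failure). */
static uint32_t select_cpu_by_modulo(uint64_t lba, uint32_t num_cpus)
{
    return (uint32_t)(lba % num_cpus);
}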

One problem with the use of a mathematical calculation like the one above is that it may cause performance degradation on random write instructions in the event of a hardware failure of the flash memory system. For example, upon die failure, data on the failed die is typically remapped to an over provisioning (OP) space for the corresponding CPU. This may result in more garbage collection (GC) activities and less over provisioning (OP) space, which may degrade performance of the CPU. For example, an approximately 1.5% OP space reduction may cause more than an approximately 10% performance drop on the associated CPU. As random write performance may be bounded by the slowest CPU, that CPU may become the bottleneck for the flash memory system. In addition, more writes caused by GC activities on the CPU may cause issues like wear-leveling and a potentially higher failure ratio, resulting in even lower performance of the CPU. Thus, it is desirable to have a system and method that provides workload balance across the various embedded CPUs while efficiently addressing issues such as die failures.

Embodiments of the present disclosure are directed to using consistent hashing for identifying virtual CPUs (also referred to as “nodes”) that are mapped to physical CPUs, for balancing write requests to the flash memory system. In general terms, consistent hashing is a distributed hashing scheme that does not depend on the number of nodes in a distributed hash table. In this regard, the output range of a hash function is treated as a fixed circular space referred to as the hash ring, with the largest hash value wrapping around to the smallest hash value. In one embodiment, a request distribution engine employs consistent hashing to map the LBA of a write request to a virtual CPU, and further identifies the physical CPU that is mapped to the virtual CPU. The write request may be forwarded to the identified physical CPU for handling.

In one embodiment, the physical CPU that is identified for the LBA may be stored in a bitmap or in any suitable format. The CPU information stored in the bitmap may be used to process read requests. In this regard, the request distribution engine identifies an LBA in a read request, and retrieves the bitmap for determining the CPU that is mapped to the LBA. The read request may be sent to the identified CPU for handling.

In one embodiment, the request distribution engine adds or removes virtual nodes for a physical CPU based on a sensed condition. One example condition may be failure of a flash memory die. In the event of a memory die failure, a virtual node may be added to reduce traffic to the physical CPU with the failed die. The added virtual node may be mapped to a physical CPU other than the CPU with the failed die. In one embodiment, the use of consistent hashing minimizes the keys that have to be remapped due to the addition of the virtual node.

FIG. 1 is a block diagram of a system for distributing data requests according to one embodiment. In one embodiment, the system includes a host device 100 coupled to a non-volatile memory (NVM) system 102 over a storage interface bus 104. The host 100 may be a computing device including, for example, a server, a portable electronic device such as a portable multimedia player (PMP), a personal digital assistant (PDA), or a smart phone, an electronic device such as a computer or a high-definition television (HDTV), or an application processor installed in any such electronic devices.

In one embodiment, the host 100 makes I/O requests for writing and reading data to and from the NVM system 102, over the storage interface bus 104, using a storage interface protocol. The storage interface bus 104 may be, for example, a Peripheral Component Interconnect Express (PCIe) bus. The storage interface protocol may be, for example, a non-volatile memory express (NVMe) protocol.

In one embodiment, the NVM system 102 may include a solid state drive (SSD), although embodiments of the present disclosure are not limited thereto. The NVM system 102 may include a memory controller 106 and one or more NVM devices 108 a-108 n (collectively referenced as 108). In one embodiment, the NVM devices 108 include a flash memory device. The controller 106 may be configured to access the NVM devices 108 through at least one channel. For example, the NVM system 102 may include N channels Ch1 to ChN, and the memory controller 106 may be configured to access the NVM devices 108 through the N channels Ch1 to ChN.

The controller 106 may also include at least one central processing unit (CPU) (hereinafter referred to as the HCPU 110), a plurality of second processors 112 a-112 n (collectively referenced as 112), a host interface layer (HIL) 114, a flash translation layer (FTL) 116, and an internal memory 118. The internal memory 118 may include a DRAM (dynamic random access memory), SRAM (static random access memory), and/or DTCM (Data Tightly Coupled Memory).

In one embodiment, the HCPU 110 is configured to execute the HIL 114 for processing requests from the host 100, such as, for example, read or write requests, and for generating internal requests for processing by the second processors 112. The HIL 114 may be realized as firmware and may be loaded on the internal memory 118 to be executed by the HCPU 110.

In one embodiment, the second processors 112 are configured to receive requests from the HCPU 110 and access data stored in the NVM devices 108 based on the memory addresses contained in the requests. When the non-volatile memory devices 108 include a flash memory device, the second processors may execute the FTL 116 to interface with the NVM devices 108. In this regard, the FTL 116 may include system software (or firmware) for managing writing, reading, and erasing operations of the flash memory device, and may be loaded on the internal memory 118 and operated by the second processors 112. The second processors 112, referred to herein as FTL CPUs (FCPUs) 112, may be configured to operate the FTL 116. In one embodiment, the FTL may include a mapping table including information for converting between a logical address and a physical address.

In one embodiment, the HIL 114 includes a request distribution engine 120 that selects one of the FCPUs 112 for processing a request from the host 100. In this regard, the request distribution engine 120 first identifies a virtual CPU node by applying consistent hashing to the memory address included in the request. In one embodiment, the internal memory 118 stores a table with a mapping of virtual CPU nodes to FCPUs 112. In one embodiment, the request distribution engine 120 accesses the table to identify the FCPU 112 that corresponds to the identified virtual CPU. The request distribution engine 120 may transmit an internal request to the identified FCPU 112 for processing the request. In this manner, workloads may be distributed to the various FCPUs 112 in a balanced manner. Furthermore, use of consistent hashing may allow one or more virtual CPUs and associated FCPUs 112 to be added and/or removed while minimizing the remapping of key values to the virtual CPUs, and the associated migration of data due to the remapping.

FIG. 2A represents a conceptual block diagram of an example hash ring 200 used with consistent hashing according to one embodiment.

One or more virtual CPUs (nodes) 202 a-202 c (collectively referenced as 202) may be assigned a position on the hash ring 200. In one embodiment, the position of the nodes is determined using a hash function that uses an LBA address as a key. The LBA address that is used as the key may be in an address range that is assigned to (or covered by) the node. The address ranges covered by the nodes may be the same or different from one another. For example, assuming an approximately 1.2 TB flash memory system, the first virtual CPU 202 a may cover approximately LBA 0-400 GB, the second virtual CPU 202 b may cover approximately LBA 400-800 GB, and the third virtual CPU 202 c may cover approximately LBA 800-1200 GB. In one embodiment, the ending LBA address in the range is used as the key for computing the hash value for the virtual CPU 202. In some implementations, a beginning address in the range may be used in lieu of the end address.
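
As a non-limiting sketch of this node placement, the structure, field names, and hash function below are assumptions made for illustration (the disclosure mentions MD5 as one possible hash function; a 64-bit mixing hash is used here only to keep the example self-contained).

#include <stdint.h>

/* Stand-in 64-bit mixing hash (splitmix64 finalizer); any reasonably
 * uniform hash function, such as MD5, may be substituted. */
static uint64_t hash_key(uint64_t key)
{
    key += 0x9E3779B97F4A7C15ULL;
    key = (key ^ (key >> 30)) * 0xBF58476D1CE4E5B9ULL;
    key = (key ^ (key >> 27)) * 0x94D049BB133111EBULL;
    return key ^ (key >> 31);
}

/* A virtual CPU (node) covers an LBA range and is positioned on the
 * hash ring by hashing the last LBA of that range. */
struct vnode {
    uint32_t vcpu_id;
    uint64_t range_start_lba;
    uint64_t range_end_lba;
    uint64_t ring_pos;       /* hash of range_end_lba */
};

static void place_vnode(struct vnode *n)
{
    n->ring_pos = hash_key(n->range_end_lba);
}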

The request distribution engine 120 may also assign one or more objects 204 a-204 e (collectively referenced as 204) a position on the hash ring 200, for selecting a node to which the object is to be assigned. The objects may be, for example, memory addresses or other keys provided in an I/O request by the host 100. In the example of FIGS. 2A-2C, the objects 204 are LBA addresses.

In one embodiment, the I/O request that triggers the mapping of objects on the hash ring includes a write request. In one embodiment, the same hash function used to determine the positions of the virtual CPUs 202 is used to determine the position of memory addresses in the write requests. In this regard, in order to determine a position of the objects 204 on the hash ring, the memory addresses of the objects 204 may be provided to the hash function for determining corresponding hash values. The objects 204 may then be positioned on the hash ring based on the corresponding hash values.

In one embodiment, once the memory address of a write request is mapped on the hash ring 200, the request distribution engine 120 invokes an assignment algorithm to assign a virtual CPU 202 to the memory address. The assignment algorithm may cause, for example, selection of a virtual CPU 202 that has a hash value closest to the hash value of the inbound object, in a clockwise direction (e.g., in a direction where the hash values increase) or a counterclockwise direction (e.g., in a direction where the hash values decrease) on the hash ring 200.

In the embodiment where the rule is to search in the clockwise direction with increasing hash values, the example memory address objects 204 of FIG. 2A may be assigned to the nodes 202 as follows: object 204 b to the first virtual CPU 202 a; objects 204 d and 204 e to the second virtual CPU 202 b; and object 204 f to the third virtual CPU 202 c. In the example of FIG. 2A, object 204 a has a hash value greater than the hash value of the third virtual CPU 202 c, which is the virtual CPU with the highest hash value. Given that there are no virtual CPUs with a hash value greater than that of virtual CPU 202 c, object 204 a is also assigned to the first virtual CPU 202 a.

In one embodiment, the disclosed systems may implement the hash ring via a data structure that stores the hash values of the virtual CPUs 202 as an ordered list. A list of the virtual CPUs 202 may also be maintained as a separate list. In one embodiment, the list of hash values may be searched using any searching algorithm (e.g., binary search) to find the first hash value (and associated virtual CPU) that is greater than the hash value of the object in the request from the host 100. If no such value is found, the search algorithm may wrap around to find the first hash value (and associated virtual CPU) in the list.
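
A minimal sketch of this ordered-list lookup is shown below, assuming the ring positions are kept in an ascending array; the names are illustrative, and the comparison follows the “equal to or greater” convention described later with respect to FIG. 5.

#include <stddef.h>
#include <stdint.h>

/* Return the index of the first ring position that is equal to or greater
 * than h.  If h exceeds every stored position, wrap around to index 0,
 * which models the circular nature of the hash ring. */
static size_t ring_lookup(const uint64_t *positions, size_t count, uint64_t h)
{
    size_t lo = 0, hi = count;          /* binary search over the sorted array */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (positions[mid] < h)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo == count) ? 0 : lo;      /* wrap around past the largest value */
}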

Once the virtual CPU 202 is selected for an incoming object, the request distribution engine 120 may identify an FCPU 112 associated with the selected virtual CPU 202. In one embodiment, the request distribution engine 120 searches a table mapping the virtual CPUs 202 to the FCPUs 112. The request associated with the incoming object may then be delivered to the appropriate FCPU for processing.

In some situations, it may be desirable to add or delete a node 202 from the hash ring 200 after the hash ring has been in use. For example, it may be desirable to add or remove a node 202 in the event of a die failure associated with one of the FCPUs 112, in order to minimize traffic to that FCPU.

FIG. 2B is a conceptual block diagram of the hash ring 200 of FIG. 2A, that adds a fourth virtual CPU 202 d to the ring. The fourth virtual CPU 202 d may cover all or a portion of the address ranges covered by the first virtual CPU 202 a. In this regard, the location of the fourth virtual CPU 202 d may be determined by calculating a hash value of the last address in the range that is now covered by the fourth virtual CPU 202 d. Thus, objects with an address range previously covered by the first virtual CPU 202 a, but now covered by the fourth virtual CPU 202 d, may be assigned to the fourth virtual CPU 202 d instead of the first virtual CPU 202 a. This may help minimize traffic to the FCPU assigned to the first virtual CPU 202 a. The assignments of other objects, however, may remain the same.

FIG. 2C is a conceptual block diagram of the hash ring 200 of FIG. 2A, that removes the third virtual CPU 202 c from the ring. The deletion of the third virtual CPU 202 c causes an object 204 e that would have been assigned to the third virtual CPU 202 c to now be assigned to the first virtual CPU 202 a. The other assignments, however, remain the same.

FIG. 3A is an exemplary mapping table 300 of virtual CPUs 202 to FCPUs (also referred to as physical CPUs) 112 according to one embodiment. The mapping table may be stored, for example, in the internal memory 118. In one embodiment, the mapping table 300 includes a list of virtual CPU IDs 302 mapped to corresponding FCPU IDs 304. The mapping of virtual CPUs to FCPUs may be determined, for example, by an administrator based on capabilities of the FCPU. For example, the more powerful an FCPU, the greater the number of virtual CPUs that may be assigned to it.

In one embodiment, the list of virtual CPU IDs 302 is further mapped to a list of hash values 306 that identify a position of the virtual CPU 202 on the hash ring 200. In one embodiment, the hash values are stored in a data structure as an ordered list. In the event that a new virtual CPU 308 is added to the mapping table 300, as exemplified in FIG. 3B, the hash value of the new virtual CPU is computed and inserted into the list 306 in sorted order.
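
One possible in-memory layout for the mapping table of FIGS. 3A-3B, together with the sorted insertion of a new node's hash value, is sketched below; the structure, field names, and fixed capacity are illustrative assumptions only.

#include <stdint.h>

#define MAX_VNODES 64

struct vcpu_map {
    uint64_t ring_pos[MAX_VNODES];  /* sorted hash values (ring positions) */
    uint32_t vcpu_id[MAX_VNODES];   /* virtual CPU ID at each position */
    uint32_t fcpu_id[MAX_VNODES];   /* physical FCPU mapped to that virtual CPU */
    uint32_t count;
};

/* Insert a new virtual CPU while keeping ring_pos[] in ascending order,
 * so that a binary search over the list remains valid after nodes are added. */
static int vcpu_map_insert(struct vcpu_map *m, uint64_t pos,
                           uint32_t vcpu, uint32_t fcpu)
{
    uint32_t i;

    if (m->count >= MAX_VNODES)
        return -1;
    for (i = m->count; i > 0 && m->ring_pos[i - 1] > pos; i--) {
        m->ring_pos[i] = m->ring_pos[i - 1];
        m->vcpu_id[i] = m->vcpu_id[i - 1];
        m->fcpu_id[i] = m->fcpu_id[i - 1];
    }
    m->ring_pos[i] = pos;
    m->vcpu_id[i] = vcpu;
    m->fcpu_id[i] = fcpu;
    m->count++;
    return 0;
}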

In one embodiment, the internal memory 118 further stores a bitmap that provides FCPU IDs for different LBAs. FIG. 4 is a conceptual block diagram of an example bitmap 400 according to one embodiment. The number of bits 402 in the bitmap may depend on the number of FCPUs 112 to be identified, and the number of LBAs to be mapped to the FCPUs. In the example of FIG. 4, two bits are used to identify up to four different FCPUs. Thus, in the example of FIG. 4, every two bits of the bitmap identify the FCPU ID assigned to an LBA. Although only four LBAs are represented in the example of FIG. 4, it should be appreciated that the bitmap may represent all possible LBAs, the number of which may be determined based on the storage capacity of the NVM system 102.
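
A sketch of such a two-bits-per-LBA bitmap is shown below, assuming up to four FCPUs; the function names and packing are illustrative assumptions.

#include <stdint.h>

#define BITS_PER_ENTRY   2u                 /* 2 bits identify up to 4 FCPUs */
#define ENTRIES_PER_BYTE (8u / BITS_PER_ENTRY)

/* Record which FCPU owns a given LBA (updated on writes). */
static void bitmap_set_fcpu(uint8_t *bitmap, uint64_t lba, uint8_t fcpu_id)
{
    uint64_t byte  = lba / ENTRIES_PER_BYTE;
    unsigned shift = (unsigned)(lba % ENTRIES_PER_BYTE) * BITS_PER_ENTRY;

    bitmap[byte] = (uint8_t)((bitmap[byte] & ~(0x3u << shift)) |
                             ((fcpu_id & 0x3u) << shift));
}

/* Look up the FCPU previously recorded for an LBA (used on reads). */
static uint8_t bitmap_get_fcpu(const uint8_t *bitmap, uint64_t lba)
{
    uint64_t byte  = lba / ENTRIES_PER_BYTE;
    unsigned shift = (unsigned)(lba % ENTRIES_PER_BYTE) * BITS_PER_ENTRY;

    return (uint8_t)((bitmap[byte] >> shift) & 0x3u);
}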

In one embodiment, the request distribution engine 120 updates the bitmap 400 in response to a write request. The write request may include the LBA associated with a memory location of the NVM 102 where data is to be stored. As described, the request distribution engine 120 identifies the virtual CPU 202 to handle the write request, and maps the virtual CPU 202 to the corresponding FCPU 112. In one embodiment, the request distribution engine 120 stores the identifier of the FCPU 112 in a portion of the bitmap 400 that corresponds to the LBA of the write request.

In one embodiment, the request distribution engine 120 references the bitmap 400 in response to a read request from the host 100. In this regard, the request distribution engine 120 may search the portion of the bitmap corresponding to the LBA of the read request to retrieve the stored FCPU ID. The request distribution engine 120 may then direct the read request to the identified FCPU for handling. As a person of skill in the art should appreciate, the extra processing overhead of calculating hash values, determining virtual CPU IDs based on calculated hash values, and further identifying the FCPU of the virtual CPU, may be avoided by storing the LBA-to-FCPU mapping in the bitmap 400 in the internal memory 118. In terms of memory space, for an approximately 1 TB drive with an approximately 4 KB page unit, the bitmap may represent approximately 256M addresses in total, translating to approximately 64 MB of memory space in the event that the drive includes four FCPUs (256M addresses x 2 bits per address / 8 bits per byte = 64 MB).

FIG. 5 is a flow diagram of a process for processing a first access request according to one embodiment. The first access request may be, for example, a write request. The process starts, and at block 500, the HIL 114 receives the write request from the host 100. The write request may include an address (e.g., LBA) associated with a memory location of the NVM device 108 to which data is to be written.

In response to receipt of the write request, the HCPU 110 invokes the request distribution engine 120 to identify the FCPU that is to handle the write request. In this regard, the request distribution engine 120, at block 502, calculates a hash value (h1) based on the address of the write request. A hash function, such as an MD5 hash function, may be used to calculate the hash value.

At block 504, the request distribution engine 120 maps h1 on a hash ring (e.g., hash ring 200). The hash ring may be implemented as a sorted list of hash values (also referred to as node values) corresponding to the virtual CPUs (e.g., virtual CPUs 202), and the mapping of h1 onto the hash ring may entail comparing h1 to the hash values in the list for identifying a position of h1 on the list.

At block 506, the request distribution engine 120 invokes a searching algorithm to find the closest virtual CPU (node) on the hash ring (e.g., hash ring 200) whose hash value is greater than h1. The searching algorithm may be, for example, a binary search algorithm that searches the list of hash values until it identifies the first hash value that is equal to or greater than h1. The searching algorithm may return the virtual CPU ID of the identified hash value in the list.

In one embodiment, if h1 is greater than any hash value in the list, and the searching algorithm reaches the end of the list without finding a hash value that is equal to or greater than h1, the searching algorithm may wrap around the list and identify the first hash value in the list. The searching algorithm may return the virtual CPU ID of the first hash value in the list as the virtual CPU that is to handle the request.

At block 508, the request distribution engine 120 searches a mapping table (e.g., mapping table 300) to identify the FCPU 112 mapped to the returned virtual CPU ID.

At block 510, the request distribution engine 120 updates a bitmap (e.g., bitmap 400) of FCPU IDs. In this regard, the request distribution engine identifies a preset number of bits (e.g., two bits) of the bitmap corresponding to the LBA in the write request, and stores the FCPU ID in the identified bits of the bitmap.

At block 512, the HIL 114 transmits the request to the FCPU 112 to access the memory location in the write request, and cause the NVM device 108 to write data in the accessed memory location.
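
Tying blocks 500-512 together, the following sketch reuses the illustrative helpers introduced above (hash_key, ring_lookup, struct vcpu_map, and the FCPU bitmap); it is one possible composition under those assumptions rather than the actual firmware interface.

/* Blocks 500-512 of FIG. 5, combined: hash the LBA, find the owning virtual
 * CPU on the ring, map it to an FCPU, record the choice in the bitmap, and
 * return the FCPU that should receive the internal write request. */
static uint32_t distribute_write(const struct vcpu_map *m, uint8_t *bitmap,
                                 uint64_t lba)
{
    uint64_t h1   = hash_key(lba);                            /* block 502 */
    size_t   idx  = ring_lookup(m->ring_pos, m->count, h1);   /* blocks 504-506 */
    uint32_t fcpu = m->fcpu_id[idx];                          /* block 508 */

    bitmap_set_fcpu(bitmap, lba, (uint8_t)fcpu);              /* block 510 */
    return fcpu;                                              /* block 512: forward */
}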

FIG. 6 is a flow diagram of a process for processing a second access request according to one embodiment. The second access request may be, for example, a read request. The process starts, and at block 600, the HIL receives the read request from the host 100. The read request may include an address associated with a memory location of the NVM device 108 from which data is to be read. The memory location may be the memory location associated with a prior write request.

In response to receipt of the read request, the HCPU 110 invokes the request distribution engine 120 to identify the FCPU that is to handle the read request. In this regard, the request distribution engine 120, at block 602, retrieves a bitmap (e.g., bitmap 400) of FCPU IDs.

At block 604, the request distribution engine 120 locates the bits of the bitmap corresponding to the address in the read request, and retrieves the FCPU ID from the located bits.

At block 606, the HIL 114 transmits the read request to the identified FCPU 112 to access the memory location in the read request, and cause the NVM device 108 to read data from the accessed memory location.
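
For completeness, the corresponding read-path sketch under the same assumptions reduces to a single bitmap lookup; no hashing is needed on the read path.

/* Blocks 600-606 of FIG. 6: the FCPU that handled the earlier write is
 * simply read back from the bitmap and the request is forwarded to it. */
static uint32_t distribute_read(const uint8_t *bitmap, uint64_t lba)
{
    return bitmap_get_fcpu(bitmap, lba);    /* blocks 602-604; block 606 forwards */
}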

FIG. 7 is a flow diagram of a process for adding a virtual node to a hash ring according to one embodiment. The process starts, and at block 700, the request distribution engine 120 monitors and identifies a trigger for adding the virtual node. The trigger may be, for example, failure of a memory die of an NVM device 108 controlled by a first FCPU selected from the various FCPUs 112.

At block 702, the request distribution engine 120 identifies a first virtual CPU associated with the first FCPU 112 by, for example, referencing the mapping table 300.

At block 704, the request distribution engine 120 determines the address range covered by the first virtual CPU.

At block 706, the request distribution engine 120 generates a new virtual CPU, and at block 708, assigns at least part of the address range covered by the first virtual CPU to the new virtual CPU. For example, assuming that the address range of the first virtual CPU is LBA 0-400 GB, the new virtual CPU may be assigned LBA 0-300 GB while the first virtual CPU retains LBA 300-400 GB, to help reduce traffic to the first virtual CPU.

At block 710, the request distribution engine 120 maps the new virtual CPU on the hash ring 200. In this regard, the request distribution engine 120 may compute a hash value for the new virtual CPU based on, for example, the last address in the range of addresses (e.g., LBA 300 GB) covered by the new virtual CPU.

At block 712, the mapping table 300 is updated with information on the new virtual CPU. For example, the new virtual CPU may be assigned to a second FCPU different from the first FCPU that is associated with, for example, the failed die.

At block 714, a determination is made as to whether the data stored in the memory addresses now covered by the second FCPU should be transferred from the memory die of the NVM device 108 of the first FCPU. For example, data in LBA 0-300 GB may be migrated from the memory die of the NVM device 108 associated with the first FCPU to the memory die of the NVM device associated with the second FCPU.

The data migration need not happen immediately. For example, a condition may be monitored to determine whether data migration should be performed. In one example, the condition may be the start of a garbage collection process. In this regard, at block 716, data may be migrated from the memory die of the first FCPU to the second FCPU during the garbage collection process.
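
Under the same illustrative assumptions as the earlier sketches, blocks 702-712 may be composed as follows; trigger detection (block 700) and the data migration of blocks 714-716 are omitted, and the helper names are hypothetical.

/* Split off part of the address range covered by the overloaded virtual CPU,
 * position a new virtual CPU on the ring at the hash of the new range boundary,
 * and map it to a different FCPU so that future writes in that range bypass
 * the FCPU with the failed die. */
static int add_vnode_for_split(struct vcpu_map *m,
                               uint32_t new_vcpu_id, uint32_t target_fcpu_id,
                               uint64_t split_end_lba)
{
    uint64_t pos = hash_key(split_end_lba);        /* block 710 */

    /* block 712: insert the new node, keeping the ring positions sorted. */
    return vcpu_map_insert(m, pos, new_vcpu_id, target_fcpu_id);
}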

In one embodiment, the request distribution engine 120 may further delete an existing virtual CPU based on a monitored condition. For example, it may be desirable to delete a node when requests to the corresponding FCPU of the node exceed those to other FCPUs (e.g., by 25% or more), or when there are more garbage collection activities in the corresponding FCPU than in other FCPUs. In one embodiment, the deleting of a virtual CPU may simply entail removing the virtual CPU ID and associated hash value from the mapping table 300 and the associated hash list 306. In one embodiment, when the virtual CPU ID is removed from the hash ring, an adjacent virtual CPU in the clockwise direction assumes the LBA addresses previously covered by the removed virtual CPU.
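
A corresponding removal sketch, again assuming the illustrative struct vcpu_map above, simply deletes the node's entry from the sorted arrays; the next node in the clockwise (ascending-hash) direction then takes over the removed node's arc.

/* Remove a virtual CPU from the mapping table and the sorted hash list. */
static int vcpu_map_remove(struct vcpu_map *m, uint32_t vcpu)
{
    uint32_t i, j;

    for (i = 0; i < m->count; i++) {
        if (m->vcpu_id[i] != vcpu)
            continue;
        for (j = i; j + 1 < m->count; j++) {
            m->ring_pos[j] = m->ring_pos[j + 1];
            m->vcpu_id[j] = m->vcpu_id[j + 1];
            m->fcpu_id[j] = m->fcpu_id[j + 1];
        }
        m->count--;
        return 0;
    }
    return -1;
}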

It will be understood that, although the terms “first”, “second”, “third”, etc., may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section discussed herein could be termed a second element, component, region, layer or section, without departing from the spirit and scope of the inventive concept.

It should also be understood that the sequence of steps of the processes in FIGS. 5-7 is not fixed, but can be modified, changed in order, performed differently, performed sequentially, concurrently, or simultaneously, or altered into any desired sequence, as recognized by a person of skill in the art.

As used herein, the singular forms “a” and “an” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising”, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. Further, the use of “may” when describing embodiments of the inventive concept refers to “one or more embodiments of the present disclosure”. Also, the term “exemplary” is intended to refer to an example or illustration. As used herein, the terms “use,” “using,” and “used” may be considered synonymous with the terms “utilize,” “utilizing,” and “utilized,” respectively.

Although exemplary embodiments of a system and method for workload distribution in an NVM system have been specifically described and illustrated herein, many modifications and variations will be apparent to those skilled in the art. Accordingly, it is to be understood that a system and method for workload distribution in an NVM system constructed according to principles of this disclosure may be embodied other than as specifically described herein. The disclosure is also defined in the following claims, and equivalents thereof.

What is claimed is:
1. A method for workload distribution in a system having a non-volatile memory device, comprising: receiving a request including an address associated with a memory location of the non-volatile memory device; calculating a hash value based on the address; searching a list of node values; identifying one of the node values in the list based on the hash value; identifying a processor based on the one of the node values; storing the address in association with the processor; and transmitting the request to the processor for accessing the memory location.
2. The method of claim 1, wherein the request includes a request to write data to the memory location.
3. The method of claim 1, wherein the address includes a logical address.
4. The method of claim 1, wherein the identifying of one of the node values is based on consistent hashing.
5. The method of claim 4, wherein the list of node values represents a consistent hash ring.
6. The method of claim 1, wherein the list of node values is a sorted list, wherein the identifying of the one of the node values includes identifying a first one of the node values in the list with a value equal to or greater than the hash value.
7. The method of claim 1, wherein the identifying of the processor includes searching a table including a mapping of the node values to processor identifiers.
8. The method of claim 1, wherein a processor identifier for the processor is stored in a bitmap in association with the address.
9. The method of claim 1, wherein the processor includes an embedded processor, and the memory location identifies a location of a flash memory device.
10. The method of claim 1, further comprising: identifying a trigger condition associated with the processor; identifying a node associated with the processor, wherein the node is identified via one of the node values; identifying an address range covered by the node; associating at least a portion of the address range to a second node; and mapping the second node to a second processor different from the processor.
11. A non-volatile memory system, comprising: a non-volatile memory device; a first processor coupled to the non-volatile memory device; and a second processor coupled to the first processor, the second processor being configured to: receive a request including an address associated with a memory location of the non-volatile memory device; calculate a hash value based on the address; map the hash value onto a list of node values; identify one of the node values in the list based on the hash value; identify the first processor based on the one of the node values; store the address in association with the first processor; and transmit the request to the first processor for accessing the memory location.
12. The system of claim 11, wherein the request includes a request to write data to the memory location.
13. The system of claim 11, wherein the address includes a logical address.
14. The system of claim 11, wherein the identifying of one of the node values is based on consistent hashing.
15. The system of claim 14, wherein the list of node values represents a consistent hash ring.
16. The system of claim 11, wherein the list of node values is a sorted list, and wherein, in identifying the one of the node values, the second processor is configured to identify a first one of the node values in the list with a value equal to or greater than the hash value.
17. The system of claim 11, wherein, in identifying the first processor, the second processor is configured to search a table including a mapping of the node values to first processor identifiers.
18. The system of claim 11, wherein a processor identifier for the first processor is stored in a bitmap in association with the address.
19. The system of claim 11, wherein the first processor includes an embedded processor, and the memory location identifies a location of a flash memory device.
20. The system of claim 11, wherein the second processor is further configured to: identify a trigger condition associated with the first processor; identify a node associated with the first processor, wherein the node is identified via one of the node values; identify an address range covered by the node; associate at least a portion of the address range to a second node; and map the second node to a third processor different from the first processor.